The American National Corpus: A Standardized Resource for American English
Authors
Abstract
Linguistic research has become heavily reliant on text corpora over the past ten years. Such resources are becoming increasingly available through efforts such as the Linguistic Data Consortium (LDC) in the US and the European Language Resources Association (ELRA) in Europe. However, the corpora that are gathered and distributed through these and other mechanisms consist, in the main, of texts that can be easily acquired and redistributed without undue problems such as copyright restrictions. This practice has resulted in a vast over-representation of certain genres among available corpora, in particular newspaper samples, which make up the greatest percentage of texts currently available from, for example, the LDC, and which also dominate the training data available for speech recognition. Other available corpora typically consist of technical reports, transcriptions of parliamentary and other proceedings, short telephone conversations, and the like. The upshot is that corpus-based natural language processing has relied heavily on language samples representative of usage in a handful of limited and linguistically specialized domains.
Similar resources
The American National Corpus: More Than the Web Can Provide
The American National Corpus (ANC) project is developing a corpus comparable to the British National Corpus (BNC), covering American English. Recent interest in the web as a source of corpus materials has caused some in the language processing community to suggest that the development of a corpus of American English is unnecessary. However, we argue that far from being rendered superfluous by t...
A Corpus-Based Contrastive Analysis of Stance Strategies in Native and Nonnative Speakers’ English Academic Writings: Introduction and Discussion Sections in Focus
The present study was an attempt to illustrate the interaction between writers and readers. Conveying of the writers’ voice, stance, and interaction with reader was put forward within this paradigm. Being a good academic writer is highly related to the use of these strategies. Adopting a position and persuading readers of claims are very important. This study was aimed at showing th...
American Humor in Promoting the Talk over the Wall with a Focus on Robert Frost’s Poems
The Yankee is an American national phenomenon. He leapt into national stature when he slipped outside of his local character. A myth was woven around him, and a cult of the Yankee developed as Yankee characteristics permeated many different characters who played tricks or told stories and entertained their audiences. The present article is an attempt to observe the Yankee myth, i...
Assertiveness, Compliance, and Politeness: Pragmatic and Sociocultural Aspects of ‘Brazilian English’ and ‘American English’
This paper showed the results of a qualitative investigation that looked into intracultural communication between Brazilian teachers and students of English, and intercultural communication between American teachers and Brazilian students of English. The aims were to identify and describe contextualization cues used by both Brazilian and American speakers of English, and to connect these cues w...
American English File Series Evaluation Based on Littlejohn’s Evaluative Framework
Textbooks play a pivotal role in language learning classrooms. The problem is, among a wide range of textbooks available, which one is more appropriate for a specific classroom and a group of learners. In order to evaluate ELT textbooks, theorists and writers have offered different kinds of evaluative frameworks based on a number of principles and criteria. This study evaluates one example of s...